feat: add Euler CFG++ and Euler-A CFG++ samplers#1354
Conversation
|
I still think it's a good addition to the samplers. I tend to see more coherent generations at relatively higher CFGs, however I may be biased, this requires more testing. |
|
I did some more testing, on a more complex generation there appears to be an increase in difference in the generated images.
The highest similarity in this case comes from 5.5 and 1.3 and it's still just 78%: I can't really say if it's better or not, however given the smooth generation trajectory I think that this method may introduce less artifacts in certain kind of images (I just managed to get it to happen on very few samples, but I'm not that good at writing prompts so I can't really test that aspect so well). |
|
As I've mentioned in #1363, a better strategy could be turning the samplers into implementation objects, so that: if (method == EULER_CFG_PP_SAMPLE_METHOD || method == EULER_A_CFG_PP_SAMPLE_METHOD) {
LOG_WARN("Spectrum requested but not supported for CFG++ samplers");
return;
}would become something like this: if (sampler->works_with_unconditioned_output()) {
LOG_WARN("Spectrum requested but not supported for the %s sampler", sampler->pretty_name().c_str());
return;
}so we keep most sampler-specific quirks inside the sampler class. OTOH, I don't think this implementation should necessarily wait for that refactor; it's not a lot of code, and it's already well tested, so it'd be perfectly fine to adapt it afterwards. |
|
That seems like a good change for maintanability. I've taken a look at your PR and it sure looks cleaner without the massive switch. It also happens that, while writing these samplers, I've also been thinking a lot about how to use the eta parameter to control the noise levels so it's nice to have it implemented. I don't think it would be a problem to adapt this code to your PR even if yours gets merged before this, as the overall structure is quite clear. |
|
I'm converting this to draft until I've reworked the code to be compatible with the latest changes. |
b88df73 to
6be4c50
Compare
|
I finally had some time to work on this and rebase it on the latest master. I'm not entirely sure if this can be considered completely good as it is right now but at least it works again. |
| } | ||
|
|
||
| if (hist.size() == static_cast<size_t>(max_order - 1)) { | ||
| if (hist.size() == static_cast<size_t>(max_order - 1), nullptr) { |
There was a problem hiding this comment.
Because this is a comma expression, the if statement will always evaluate to nullptr (= false). Is that okay?
There was a problem hiding this comment.
I must have mistakenly added it during the refactor, thanks for noticing, I've just fixed that.
There was a problem hiding this comment.
Thanks.
edit: I'm not a Contributor so that plz ignore my comments.
b27e586 to
0653406
Compare
0653406 to
2e1a489
Compare
|
Thank you for your contribution. |
Picks up 8 commits since the previous sync at 90e87bc: 0b82969 docs: add .github/pull_request_template.md 381e0df docs: add CONTRIBUTING.md 0665a7f feat: add hidream o1 image support (leejet#1485) eeac950 fix: Use PkgConfig for WebP and WebM (leejet#1400) 57ff2eb feat: support for memory-mapping model weights (leejet#1414) 9d68341 feat: add Euler CFG++ and Euler-A CFG++ samplers (leejet#1354) 60477fd docs: add new go bindings for stable-diffusion.cpp (leejet#1480) 6ee0684 feat: display server url with "http://" prefix. (leejet#1486) Conflicts, all in src/ggml_extend.hpp: 1. copy_data_to_backend_tensor signature: upstream made gf required (graph-cut needs the segment's graph to restrict uploads); our layer-streaming path needs gf=nullptr so each mini-graph uploads its full backend_tensor_data_map without filtering. Resolution: keep gf optional (default nullptr) and guard the graph_tensor_set filter on gf != nullptr. Upstream's new read_graph_tensor<T> template is added unchanged above copy_data_to_backend_tensor. 2. Tensor-loop null check: upstream added tensor/data null guards and a single ggml_get_name() lookup. Kept both, with our gf-gate layered on top of upstream's set-membership check. 3. alloc_params_buffer: upstream's mmap fast-path (skip allocation when every tensor already has data, since ggml_backend_alloc_ctx_tensors would hit n_buffers==0) and our pinned-host fast-path (allocate weights in the GPU device's host buffer for async H2D under offload) collide on the same function. Resolution: mmap check runs first and returns early — mmapped tensors can't be moved into pinned host memory — then the pinned-host path runs for the non-mmap CPU-params-with-GPU-runtime case, then the original pageable params_backend alloc as the final fallback. Smoke-tested on Z-Image-Turbo Q8 at 512x512: --offload-mode layer_streaming -> 4.0s total (coarse-stage path) --offload-to-cpu --max-vram 4 -> 8.3s total (3 graph-cut segments) HiDream O1 streaming hooks deferred to a follow-up commit.




















This PR adds support for the Euler CFG++ and Euler Ancestral CFG++ samplers: CFG++.
The logic from the code has been adapted from their repository and checked against ComfyUI's implementation and I tried to keep the sampler style as close as possible to the existing ones.
Some changes were needed in
src/stable-diffusion.cppas this specific sampler requires the unconditioned output in order to work.This currently doesn't work with Spectrum cache.
As any CFG++ sampler you must use very low CFG values (for SDXL often less than 2).
I'd be very grateful if anyone could review this, as it's the first sampler I implement that requires this kind of changes.